Old 11-09-2010, 06:49 AM   #16
hashbang#!
Member
 
Registered: Aug 2009
Location: soon to be independent Scotland
Distribution: Debian
Posts: 120

Original Poster
Rep: Reputation: 17

colucix, sorry about the confusion, your result is spot-on - see #13.

I am not interested in the interim files. They were just a kludge for my 4-step solution. The only thing I need is to append the new ids to the 2 files (as per my 1st post).

I need to sit down and work through your awk. I'd like to understand how it works so I know better next time.
 
Old 11-09-2010, 07:19 AM   #17
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by hashbang#!
colucix, sorry about the confusion, your result is spot-on - see #13.
Ok. Just seen the modification.

Quote:
Originally Posted by hashbang#!
I need to sit down and work through your awk. I'd like to understand how it does it so I know better next time.
Just bear in mind that FILENAME is an internal variable storing the name of the file currently being parsed. When you pass multiple filenames as arguments, awk processes all of them in sequence and the FILENAME variable changes accordingly. For this reason, we have two rules in the code: the first one is executed for all of the ids?????? files, the second one only for the idsmore file.

If something is still not clear, feel free to ask. FYI, my one and only awk reference is the official GNU guide: http://www.gnu.org/software/gawk/manual/.
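For instance, here is a stripped-down sketch (not the real script) just to show how FILENAME changes as awk walks through the argument list:
Code:
# every line is echoed together with the name of the file it came from
awk '
FILENAME != "idsmore" { print "id-list file " FILENAME ": " $0 }
FILENAME == "idsmore" { print "idsmore:          " $0 }
' ids?????? idsmore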
 
Old 11-09-2010, 07:34 AM   #18
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by H_TeXMeX_H
Can you post the code to generate those, because I can't imagine in my head what needs to be done.
You can simply add more print statements at your pleasure. Since the code does not require sorting (as per the first lines of the OP's script), there is no need for the sorted temporary files... or were you really interested in how to perform sorting in awk?
Code:
# rule 1: for every line of the ids?????? files, remember the id
FILENAME != "idsmore" {
  _[$0] = ""
}
# rule 2: only for idsmore - ids not already stored get appended to the output files
FILENAME == "idsmore" {
  if (! ( $1 in _ )) {
    print $1 >> ( "ids" $2 $3 )          # target file name built from fields 2 and 3
    print $1 >> ( "ids" $2 $3 ".log" )
    print $1 >> "idsmissing"
    print    >> "idsmissing_dated"       # the whole record, dates included
  }
}
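If you save the rules above in a file (append_ids.awk is just a name I picked for the example), everything runs in a single awk pass:
Code:
# the ids?????? glob expands to the five existing id files; idsmore must come last,
# otherwise the lookup array is not yet filled when its lines are read
awk -f append_ids.awk ids?????? idsmore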
 
Old 11-09-2010, 07:45 AM   #19
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
I was asking about the test files. I cannot write a solution without test files. You said you somehow made 5 files containing 56000 different ids each (280000 in total) and a file "idsmore" containing 300000 different ids. How? I don't really need that many. Or, if the problem is solved, just forget about it.
 
Old 11-09-2010, 08:33 AM   #20
hashbang#!
Member
 
Registered: Aug 2009
Location: soon to be independent Scotland
Distribution: Debian
Posts: 120

Original Poster
Rep: Reputation: 17
colucix - got it. Ingenious!
 
Old 11-09-2010, 08:48 AM   #21
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
@H_TeXMeX_H

Oops, sorry... I totally misunderstood your post. I generated 280000 numbers between 1 and 1000000 using the following awk code:
Code:
BEGIN {
  srand()                            # seed the random number generator
  do {
    num = 1 + int(rand() * 999999)   # pick a random candidate id
    if (! (num in _)) {              # keep it only if not generated before
       print num
       _[num] = ""
       count++
    }
  } while ( count < 280000 )         # stop after 280000 unique ids
}
then I cut the output into five pieces using the split command and finally added 20000 more numbers (plus dates) to generate idsmore. But for testing purposes you could also generate numbers in sequence between 1 and 280000 using seq.
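Roughly, the whole test setup was along these lines (file names are only examples, and split chooses its own suffixes such as idsaa, idsab, ...):
Code:
# 280000 unique random ids, using the BEGIN block above saved as genids.awk
awk -f genids.awk > allids
# cut the output into five pieces of 56000 lines each
split -l 56000 allids ids
# or, simpler for a quick test: sequential ids instead of random ones
seq 1 280000 > allids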
 
1 member found this post helpful.
Old 11-09-2010, 08:49 AM   #22
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
ok, thanks.
 
Old 11-09-2010, 03:50 PM   #23
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,782

Rep: Reputation: 2083
Quote:
Originally Posted by hashbang#!
Am I understanding this correctly, that grep -F speeds up the operation because it does not try to interpret the patterns as regular expressions?
Yes, when grep knows that the patterns aren't regular expressions it can internally use the hash-table approach that some posters here suggested.
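As a rough illustration (file names are only examples), with -f the whole pattern file is matched as plain strings:
Code:
# every line of patterns.txt is taken as a literal string, not a regular expression
grep -F -f patterns.txt data.txt      # lines of data.txt containing any of them
grep -v -F -f patterns.txt data.txt   # lines of data.txt containing none of them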

Quote:
I get the same result without -F but it crawls. fgrep is a fraction of a second faster than grep -F.
You have to expect some random variation between different runs of the same program.
Quote:
egrep is the same as grep -E. fgrep is the same as grep -F. Direct invocation as either egrep or fgrep is deprecated, but is provided to allow historical applications that rely on them to run unmodified.
from http://www.manpagez.com/man/1/grep/
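In other words, these pairs behave the same (pattern and file names are made up):
Code:
fgrep -f ids.txt idsmore          # legacy spelling of:
grep -F -f ids.txt idsmore
egrep '^[0-9]+$' idsmore          # legacy spelling of:
grep -E '^[0-9]+$' idsmore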
 
Old 11-09-2010, 04:14 PM   #24
hashbang#!
Member
 
Registered: Aug 2009
Location: soon to be independent Scotland
Distribution: Debian
Posts: 120

Original Poster
Rep: Reputation: 17
ntubski, thanks for the info on grep. I've changed my scripts accordingly.
 
  

